25 research outputs found
MM Algorithms for Minimizing Nonsmoothly Penalized Objective Functions
In this paper, we propose a general class of algorithms for optimizing an
extensive variety of nonsmoothly penalized objective functions that satisfy
certain regularity conditions. The proposed framework utilizes the
majorization-minimization (MM) algorithm as its core optimization engine. The
resulting algorithms rely on iterated soft-thresholding, implemented
componentwise, allowing for fast, stable updating that avoids the need for any
high-dimensional matrix inversion. We establish a local convergence theory for
this class of algorithms under weaker assumptions than previously considered in
the statistical literature. We also demonstrate the exceptional effectiveness
of new acceleration methods, originally proposed for the EM algorithm, in this
class of problems. Simulation results and a microarray data example are
provided to demonstrate the algorithm's capabilities and versatility.
Comment: A revised version of this paper has been published in the Electronic Journal of Statistics.
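The componentwise soft-thresholding update at the heart of such MM schemes can be illustrated with a minimal sketch. This is not the paper's exact algorithm; it is an iterated soft-thresholding loop for a lasso-penalized least-squares objective, where majorizing the quadratic term turns each update into a closed-form, inversion-free threshold (the names `soft_threshold` and `mm_lasso` are illustrative):

```python
import numpy as np

def soft_threshold(z, lam):
    # Componentwise soft-thresholding: the proximal map of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mm_lasso(X, y, lam, step=None, n_iter=500):
    # Iterated soft-thresholding for 0.5*||y - Xb||^2 + lam*||b||_1.
    # Majorizing the quadratic term by a separable surrogate makes each
    # update componentwise, avoiding any high-dimensional matrix inversion.
    n, p = X.shape
    if step is None:
        # Majorization constant: any value >= largest eigenvalue of X'X.
        step = np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - grad / step, lam / step)
    return b
```

With `lam = 0` the iteration reduces to plain gradient descent on the least-squares objective, which makes the majorization step easy to sanity-check.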
Online Updating of Statistical Inference in the Big Data Setting
We present statistical methods for big data arising from online analytical
processing, where large amounts of data arrive in streams and require fast
analysis without storage/access to the historical data. In particular, we
develop iterative estimating algorithms and statistical inferences for linear
models and estimating equations that update as new data arrive. These
algorithms are computationally efficient, minimally storage-intensive, and
allow for possible rank deficiencies in the subset design matrices due to
rare-event covariates. Within the linear model setting, the proposed
online-updating framework leads to predictive residual tests that can be used
to assess the goodness-of-fit of the hypothesized model. We also propose a new
online-updating estimator under the estimating equation setting. Theoretical
properties of the goodness-of-fit tests and proposed estimators are examined in
detail. In simulation studies and real data applications, our estimator
compares favorably with competing approaches under the estimating equation
setting.
Comment: Submitted to Technometrics.
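For the linear model, the core of online updating can be sketched by accumulating the sufficient statistics X'X and X'y block by block, so historical data never need to be stored. This is a minimal illustration of the idea, not the paper's full framework (in particular, it does not implement the paper's handling of rank-deficient subset design matrices beyond using a least-squares solve; the class name `OnlineLM` is hypothetical):

```python
import numpy as np

class OnlineLM:
    # Online-updating least squares: accumulate X'X and X'y as each
    # data block arrives, then solve for the coefficients on demand.
    def __init__(self, p):
        self.XtX = np.zeros((p, p))
        self.Xty = np.zeros(p)

    def update(self, X_block, y_block):
        # Fold a new block of observations into the running statistics.
        self.XtX += X_block.T @ X_block
        self.Xty += X_block.T @ y_block

    def coef(self):
        # lstsq tolerates a singular accumulated XtX (rank deficiency).
        return np.linalg.lstsq(self.XtX, self.Xty, rcond=None)[0]
```

Because X'X and X'y are exact sufficient statistics, the streamed estimate matches the full-data least-squares fit regardless of how the stream is partitioned into blocks.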
Bayesian Modeling and Inference for Nonignorably Missing Longitudinal Binary Response Data with Applications to HIV Prevention Trials
Missing data are frequently encountered in longitudinal clinical trials. To better monitor and understand the progress over time, one must handle the missing data appropriately and examine whether the missing data mechanism is ignorable or nonignorable. In this article, we develop a new probit model for longitudinal binary response data. It resolves a challenging issue for estimating the variance of the random effects, and substantially improves the convergence and mixing of the Gibbs sampling algorithm. We show that when improper uniform priors are specified for the regression coefficients of the joint multinomial model via a sequence of one-dimensional conditional distributions for the missing data indicators under nonignorable missingness, the joint posterior distribution is improper. A variation of Jeffreys prior is thus established as a remedy for the improper posterior distribution. In addition, an efficient Gibbs sampling algorithm is developed using a collapsing technique. Two model assessment criteria, the deviance information criterion (DIC) and the logarithm of the pseudomarginal likelihood (LPML), are used to guide the choices of prior specifications and to compare the models under different missing data mechanisms. We report on extensive simulations conducted to investigate the empirical performance of the proposed methods. The proposed methodology is further illustrated using data from an HIV prevention clinical trial. © Institute of Statistical Science. All rights reserved.
Online Updating of Survival Analysis
When large amounts of survival data arrive in streams, conventional estimation methods become computationally infeasible since they require access to all observations at each accumulation point. We develop methods for carrying out survival analysis under the Cox proportional hazards model in an online-updating framework; the methods are also applicable with time-dependent covariates. Specifically, we propose online-updating estimators, together with their standard errors, for both the regression coefficients and the baseline hazard function. Extensive simulation studies are conducted to investigate the empirical performance of the proposed estimators. A large colon cancer dataset from the Surveillance, Epidemiology, and End Results program and a large venture capital dataset with time-dependent covariates are analyzed to demonstrate the utility of the proposed methodologies. Supplemental files for this article are available online.
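A common building block for online updating with nonlinear models such as the Cox model is to combine per-block estimates weighted by their information (inverse-covariance) matrices, so only low-dimensional summaries travel with the stream. The sketch below shows that generic combining step only; it is a hypothetical illustration of the idea, not the paper's estimator (the function name `combine_block_estimates` is invented here):

```python
import numpy as np

def combine_block_estimates(estimates, infos):
    # Combine per-block estimates b_k using their information matrices A_k:
    #   b = (sum_k A_k)^{-1} sum_k A_k b_k
    # Only (A_k, b_k) summaries are needed, never the raw block data,
    # which is what makes the update usable on a stream.
    A = sum(infos)
    rhs = sum(Ak @ bk for Ak, bk in zip(infos, estimates))
    return np.linalg.solve(A, rhs)
```

With equal information matrices this reduces to a simple average of the block estimates; blocks carrying more information pull the combined estimate toward themselves.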
A new Bayesian joint model for longitudinal count data with many zeros, intermittent missingness, and dropout with applications to HIV prevention trials
In longitudinal clinical trials, it is common that subjects may permanently withdraw from the study (dropout), or return to the study after missing one or more visits (intermittent missingness). It is also routinely encountered in HIV prevention clinical trials that there is a large proportion of zeros in count response data. In this paper, a sequential multinomial model is adopted for dropout and subsequently a conditional model is constructed for intermittent missingness. The new model captures the complex structure of missingness and incorporates dropout and intermittent missingness simultaneously. The model also allows us to easily compute the predictive probabilities of different missing data patterns. A zero-inflated Poisson mixed-effects regression model is assumed for the longitudinal count response data. We also propose an approach to assess the overall treatment effects under the zero-inflated Poisson model. We further show that the joint posterior distribution is improper if uniform priors are specified for the regression coefficients under the proposed model. Variations of the g-prior, Jeffreys prior, and maximally dispersed normal prior are thus established as remedies for the improper posterior distribution. An efficient Gibbs sampling algorithm is developed using a hierarchical centering technique. A modified logarithm of the pseudomarginal likelihood and a concordance-based area under the curve criterion are used to compare the models under different missing data mechanisms. We then conduct an extensive simulation study to investigate the empirical performance of the proposed methods and further illustrate the methods using real data from an HIV prevention clinical trial.
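The zero-inflated Poisson distribution underlying the response model mixes a point mass at zero (structural zeros, probability pi) with an ordinary Poisson component. Its probability mass function is standard and can be sketched directly (the function name `zip_pmf` is illustrative; the paper's full model adds mixed effects and covariates on top of this):

```python
import math

def zip_pmf(k, lam, pi):
    # Zero-inflated Poisson: with probability pi the count is a structural
    # zero; otherwise it is drawn from Poisson(lam). Hence
    #   P(Y=0) = pi + (1-pi)*exp(-lam)
    #   P(Y=k) = (1-pi)*exp(-lam)*lam^k/k!   for k >= 1
    pois = math.exp(-lam) * lam**k / math.factorial(k)
    return pi + (1 - pi) * pois if k == 0 else (1 - pi) * pois
```

Inflating only the zero cell is what lets the model absorb the large proportion of zeros seen in these trials without distorting the Poisson fit to the positive counts.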
Exposure to secondhand smoke and asthma severity among children in Connecticut
Objective: To determine whether secondhand smoke (SHS) exposure is associated with greater asthma severity in children with physician-diagnosed asthma living in Connecticut, and to examine whether area of residence, race/ethnicity, or poverty moderates the association.
Methods: A large childhood asthma database in Connecticut (Easy Breathing) was linked by participant zip code to census data to classify participants by area of residence. Multinomial logistic regression models, adjusted for enrollment date, sex, age, race/ethnicity, area of residence, insurance type, family history of asthma, eczema, and exposure to dogs, cats, gas stoves, rodents, and cockroaches, were used to examine the association between self-reported SHS exposure and clinician-determined asthma severity (mild, moderate, and severe persistent vs. intermittent asthma).
Results: Of the 30,163 children with asthma enrolled in Easy Breathing, aged 6 months to 18 years and living in 161 different towns in Connecticut, those exposed to SHS had greater asthma severity (adjusted relative risk ratio (aRRR): 1.07 [1.00, 1.15] and aRRR: 1.11 [1.02, 1.22] for mild and moderate persistent asthma, respectively). The odds of Black and Puerto Rican/Hispanic children with asthma being exposed to SHS were twice those of Caucasian children. Although the odds of SHS exposure for publicly insured children with asthma were three times those of privately insured children (OR: 3.02 [2.84, 3.21]), SHS exposure was associated with persistent asthma only among privately insured children (adjusted odds ratio (aOR): 1.23 [1.11, 1.37]).
Conclusion: This is the first large-scale pragmatic study to demonstrate that children exposed to SHS in Connecticut have greater asthma severity, determined clinically using a systematic approach, and that this association varies by insurance status.
The 5 Connecticut socioeconomic status (SES) categories demonstrating the equal share percentage (ESP) for family income and poverty as of 2000.
Town of residence (by participant's zip code) was classified according to the 5 Connecticuts study as urban core, urban periphery, suburban, wealthy, or rural, as proxies for area of residence. These proxies were determined by combining town-level population density, median family income, and percent of residents living in poverty (defined as the percentage of the population below the 100% poverty threshold). The equal share line (where ESP = 0%) marks where the share of a variable does not differ from the statewide average. Adapted with permission from The Five Connecticuts report, Figure 7.